09/05/2022
The Research Collaboratory for Structural Bioinformatics - Protein Data Bank (RCSB-PDB)
PDB Statistics - an example of data distribution found on https://www.rcsb.org/stats/
From raw data to visualizations
# Count entries per superkingdom and molecule type, one row per combination
pdb_taxa_mol <- taxonomy_df %>%
  group_by(SUPERKINGDOM, `MOLECULE TYPE`) %>%
  add_tally(name = "n") %>%
  distinct(SUPERKINGDOM, `MOLECULE TYPE`, n)
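As a rough illustration of what the tally above produces, here is a minimal sketch on toy data. The five rows and the entry IDs are invented for the example, and the presence of an IDCODE column in taxonomy_df is an assumption; only the column names SUPERKINGDOM and `MOLECULE TYPE` come from the slide. The second pipeline uses dplyr's count(), which should give the same counts in one step, though row order and grouping may differ from the original.

```r
library(dplyr)

# Hypothetical toy data; the real taxonomy_df is built from PDB files not shown here.
taxonomy_df <- tibble(
  IDCODE = c("1ABC", "2DEF", "3GHI", "4JKL", "5MNO"),
  SUPERKINGDOM = c("Eukaryota", "Eukaryota", "Bacteria", "Bacteria", "Eukaryota"),
  `MOLECULE TYPE` = c("protein", "protein", "protein", "nucleic acid", "nucleic acid")
)

# Same steps as on the slide, plus ungroup() so the result is a plain tibble.
pdb_taxa_mol <- taxonomy_df %>%
  group_by(SUPERKINGDOM, `MOLECULE TYPE`) %>%
  add_tally(name = "n") %>%
  distinct(SUPERKINGDOM, `MOLECULE TYPE`, n) %>%
  ungroup()

# Shorter alternative: count() summarises directly to one row per combination.
pdb_taxa_mol_alt <- taxonomy_df %>%
  count(SUPERKINGDOM, `MOLECULE TYPE`, name = "n")
```

Either table can then be passed to a plotting step; the count() form is easier to read, while the slide's form keeps the result grouped.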
# Keep ID, resolution and experiment type for the experiment types of interest
pdb_entries_aug %>%
  select(IDCODE, RESOLUTION, `EXPERIMENT TYPE`) %>%
  filter(`EXPERIMENT TYPE` %in% exp_type_levels)
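The filtering step above relies on exp_type_levels, which is not defined on the slide. The sketch below assumes it is a character vector of experiment type names; the three values, the toy rows, and the extra HEADER column are all hypothetical. Converting the filtered column to a factor with the same levels is an added suggestion, not part of the original, and is mainly useful for keeping a fixed order in later plots.

```r
library(dplyr)

# Assumed definition: the slide uses exp_type_levels without showing it.
exp_type_levels <- c("X-RAY DIFFRACTION", "SOLUTION NMR", "ELECTRON MICROSCOPY")

# Hypothetical toy version of pdb_entries_aug.
pdb_entries_aug <- tibble(
  IDCODE = c("1ABC", "2DEF", "3GHI", "4JKL"),
  HEADER = c("HYDROLASE", "TRANSFERASE", "RIBOSOME", "OXIDOREDUCTASE"),
  RESOLUTION = c(1.8, NA, 3.2, 2.5),
  `EXPERIMENT TYPE` = c("X-RAY DIFFRACTION", "SOLUTION NMR",
                        "ELECTRON MICROSCOPY", "NEUTRON DIFFRACTION")
)

pdb_resolution <- pdb_entries_aug %>%
  select(IDCODE, RESOLUTION, `EXPERIMENT TYPE`) %>%
  # Drop entries whose experiment type is not in the chosen set.
  filter(`EXPERIMENT TYPE` %in% exp_type_levels) %>%
  # Optional: fix the level order for plotting.
  mutate(`EXPERIMENT TYPE` = factor(`EXPERIMENT TYPE`, levels = exp_type_levels))
```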
Successfully improved the PDB metadata visualizations
Database updates compromise reproducibility
Greatest challenge: combining files from different sources
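One way the combining step could look is sketched below, assuming both sources share an entry identifier column named IDCODE. The two toy tables, the mismatch in letter case, and the use of left_join() and anti_join() are illustrative assumptions, not the method used for the slides.

```r
library(dplyr)

# Hypothetical tables standing in for files from two different sources.
entries <- tibble(
  IDCODE = c("1ABC", "2DEF", "3GHI"),
  `EXPERIMENT TYPE` = c("X-RAY DIFFRACTION", "SOLUTION NMR", "ELECTRON MICROSCOPY")
)
taxonomy <- tibble(
  IDCODE = c("1abc", "3ghi", "9XYZ"),
  SUPERKINGDOM = c("Eukaryota", "Bacteria", "Viruses")
)

# Normalise the key first; sources may differ in letter case.
taxonomy_clean <- taxonomy %>% mutate(IDCODE = toupper(IDCODE))

# Keep every entry and attach taxonomy where available.
combined <- entries %>% left_join(taxonomy_clean, by = "IDCODE")

# List entries with no taxonomy match, so gaps are visible rather than silent.
unmatched <- entries %>% anti_join(taxonomy_clean, by = "IDCODE")
```

Checking the unmatched rows after each join is a cheap way to notice when the sources have drifted apart.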
Further analysis needed to account for redundancy
Pie charts: https://www.rcsb.org/stats/